DROPS

Document

DOI: 10.4230/LIPIcs.ICDT.2024.14

The Importance of Parameters in Database Queries

Authors: Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score. This score is a popular instantiation of the game-theoretic Shapley value to measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score in two different, apparently opposing, approaches to quantifying the contribution of a parameter. The application of the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. The problem is hard if we consider a fragment of queries that is hard to evaluate (as one would expect), and even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonable general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.

Cite as

Martin Grohe, Benny Kimelfeld, Peter Lindner, and Christoph Standke. The Importance of Parameters in Database Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 14:1-14:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2024.14,
  author =	{Grohe, Martin and Kimelfeld, Benny and Lindner, Peter and Standke, Christoph},
  title =	{{The Importance of Parameters in Database Queries}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.14},
  URN =		{urn:nbn:de:0030-drops-197966},
  doi =		{10.4230/LIPIcs.ICDT.2024.14},
  annote =	{Keywords: SHAP score, query parameters, Shapley value}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2023.20

Probabilistic Query Evaluation with Bag Semantics

Authors: Martin Grohe, Peter Lindner, and Christoph Standke

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)

Abstract

We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

Cite as

Martin Grohe, Peter Lindner, and Christoph Standke. Probabilistic Query Evaluation with Bag Semantics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2023.20,
  author =	{Grohe, Martin and Lindner, Peter and Standke, Christoph},
  title =	{{Probabilistic Query Evaluation with Bag Semantics}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.20},
  URN =		{urn:nbn:de:0030-drops-177636},
  doi =		{10.4230/LIPIcs.ICDT.2023.20},
  annote =	{Keywords: Probabilistic Query Evaluation, Probabilistic Databases, Bag Semantics}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2020.16

Infinite Probabilistic Databases

Authors: Martin Grohe and Peter Lindner

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)

Abstract

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.

Cite as

Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 16:1-16:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{grohe_et_al:LIPIcs.ICDT.2020.16,
  author =	{Grohe, Martin and Lindner, Peter},
  title =	{{Infinite Probabilistic Databases}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{16:1--16:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.16},
  URN =		{urn:nbn:de:0030-drops-119400},
  doi =		{10.4230/LIPIcs.ICDT.2020.16},
  annote =	{Keywords: Probabilistic Databases, Possible Worlds Semantics, Query Measurability, Relational Algebra, Aggregate Queries}
}

Document

DOI: 10.4230/LIPIcs.CSL.2018.3

Climbing up the Elementary Complexity Classes with Theories of Automatic Structures

Authors: Faried Abu Zaid, Dietrich Kuske, and Peter Lindner

Published in: LIPIcs, Volume 119, 27th EACSL Annual Conference on Computer Science Logic (CSL 2018)

Abstract

Automatic structures are structures that admit a finite presentation via automata. Their most prominent feature is that their theories are decidable. In the literature, one finds automatic structures with non-elementary theory (e.g., the complete binary tree with equal-level predicate) and automatic structures whose theories are at most 3-fold exponential (e.g., Presburger arithmetic or infinite automatic graphs of bounded degree). This observation led Durand-Gasselin to the question whether there are automatic structures of arbitrary high elementary complexity. We give a positive answer to this question. Namely, we show that for every h >=0 the forest of (infinitely many copies of) all finite trees of height at most h+2 is automatic and it's theory is complete for STA(*, exp_h(n, poly(n)), poly(n)), an alternating complexity class between h-fold exponential time and space. This exact determination of the complexity of the theory of these forests might be of independent interest.

Cite as

Faried Abu Zaid, Dietrich Kuske, and Peter Lindner. Climbing up the Elementary Complexity Classes with Theories of Automatic Structures. In 27th EACSL Annual Conference on Computer Science Logic (CSL 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 119, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{abuzaid_et_al:LIPIcs.CSL.2018.3,
  author =	{Abu Zaid, Faried and Kuske, Dietrich and Lindner, Peter},
  title =	{{Climbing up the Elementary Complexity Classes with Theories of Automatic Structures}},
  booktitle =	{27th EACSL Annual Conference on Computer Science Logic (CSL 2018)},
  pages =	{3:1--3:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-088-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{119},
  editor =	{Ghica, Dan R. and Jung, Achim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CSL.2018.3},
  URN =		{urn:nbn:de:0030-drops-96701},
  doi =		{10.4230/LIPIcs.CSL.2018.3},
  annote =	{Keywords: Automatic Structures, Complexity Theory, Model Theory}
}

Search Results

Documents authored by Lindner, Peter

The Importance of Parameters in Database Queries

Abstract

Cite as

Probabilistic Query Evaluation with Bag Semantics

Abstract

Cite as

Infinite Probabilistic Databases

Abstract

Cite as

Climbing up the Elementary Complexity Classes with Theories of Automatic Structures

Abstract

Cite as

Thanks for your feedback!

Could not send message